Journal of KIISE
Korean Title |
A Technique for Improving the Imbalanced Data Problem in Numerical Datasets by Combining the Tomek Links Method with Balancing GAN |
English Title |
A Study on Development of Technology to Improve Imbalanced Data Problems in Numerical Dataset Using Tomek Links Method combined with Balancing GAN |
Author |
Hyunsik Na
Sohee Park
Daeseon Choi
|
Citation |
VOL 47 NO. 10 PP. 0974 ~ 0984 (2020. 10) |
Korean Abstract |
Machine learning is used to good effect in a wide range of applications, including data classification, speech recognition, and predictive modeling. However, class imbalance in the training dataset degrades a model's performance on the minority class. This paper proposes a new data augmentation method that combines Balancing GAN with the Tomek Links method in order to address the imbalanced data problem and find a clear decision boundary. To validate the proposed method, we used five datasets to evaluate its performance across classification models, and compared it with data sampling and GAN-based data augmentation techniques. In 17 of the 25 performance evaluations, classification performance improved by 0.05 to 0.195 or was maintained. The proposed method showed its potential as a new way to address the imbalanced data problem. |
English Abstract |
Machine learning performs well and is widely used in applications such as data classification, voice recognition, and predictive modeling. However, class imbalance in the training dataset degrades classification performance on the minority class. In this paper, we propose a new data augmentation method that combines Balancing GAN with the Tomek Links method to address the imbalanced data problem and find a clear decision boundary. To verify the proposed method, we evaluated its performance across classification models on five datasets and compared it with data sampling and GAN-based data augmentation techniques. In 17 of the 25 performance evaluations, classification performance improved by 0.05 to 0.195 or was maintained. The proposed method shows potential as a new approach to the imbalanced data problem. |
Keyword |
imbalanced data
balancing GAN
Tomek Links method
tabular dataset
numerical dataset
|